# Zero-shot Learning
## Magma 8B GGUF
MIT · Mungert · 545 downloads · 1 like
Magma-8B is an image-text-to-text model in the GGUF format, suitable for multimodal tasks.
Tags: Image-to-Text

## Arshgpt
MIT · arshiaafshani · 69 downloads · 5 likes
Transformers is an open-source library developed by Hugging Face, providing various pretrained models for natural language processing tasks.
Tags: Large Language Model, Transformers

## Openvision Vit Small Patch16 224
Apache-2.0 · UCSC-VLAA · 17 downloads · 0 likes
OpenVision is a fully open, cost-effective family of advanced vision encoders focused on multimodal learning.
Tags: Image Enhancement

## Bart Large Empathetic Dialogues
sourname · 199 downloads · 1 like
This model is built on the transformers library; its specific purpose and functionality are not documented.
Tags: Large Language Model, Transformers

## Falcon H1 1.5B Deep Base
Other · tiiuae · 194 downloads · 3 likes
Falcon-H1 is an efficient hybrid-architecture language model developed by TII, combining Transformer and Mamba architectures to support multilingual tasks.
Tags: Large Language Model, Transformers, Supports Multiple Languages

## Openbioner Base
MIT · disi-unibo-nlp · 210 downloads · 1 like
OpenBioNER is a lightweight BERT model designed for open-domain biomedical named entity recognition (NER). It can identify unseen entity types from natural language descriptions of the target types alone, without retraining.
Tags: Sequence Labeling, English

## Xglm 564M
MIT · facebook · 11.13k downloads · 51 likes
XGLM-564M is a multilingual autoregressive language model with 564 million parameters, trained on a balanced corpus of 30 languages totaling 500 billion subwords.
Tags: Large Language Model, Supports Multiple Languages

## Zero Mistral 24B
MIT · ZeroAgency · 41 downloads · 2 likes
Zero-Mistral-24B is an improved text-only model based on Mistral-Small-3.1-24B-Instruct-2503, primarily adapted for Russian and English, with the original visual capabilities removed to focus on text generation tasks.
Tags: Large Language Model, Transformers, Supports Multiple Languages

## Orpo Med V3
Apache-2.0 · Jayant9928 · 2,852 downloads · 3 likes
This is a transformers model hosted on the Hugging Face Hub; its specific functions and uses are not documented.
Tags: Large Language Model, Transformers

## Xlm Roberta Large Pooled Cap Minor
MIT · poltextlab · 61 downloads · 0 likes
A multilingual text classification model fine-tuned from xlm-roberta-large, used for minor topic code classification in comparative agendas projects.
Tags: Text Classification, PyTorch, Other

## Sam Vit Base
MIT · sajabdoli · 184 downloads · 0 likes
An improved version of the Facebook SAM model (sam-vit-base), specifically optimized for image segmentation tasks in CVAT.
Tags: Image Segmentation, Supports Multiple Languages

## Quantum STT
Apache-2.0 · sbapan41 · 100 downloads · 1 like
Quantum_STT is an advanced automatic speech recognition (ASR) and speech translation model, trained with large-scale weak supervision and supporting multiple languages and tasks.
Tags: Speech Recognition, Transformers, Supports Multiple Languages

## Kok Basev2
Apache-2.0 · moelanoby · 195 downloads · 1 like
Kok-Base is a multilingual model supporting English, Arabic, and Czech, suitable for various natural language processing tasks.
Tags: Large Language Model, Transformers, Supports Multiple Languages

## Internvl2 5 HiMTok 8B
Apache-2.0 · yayafengzi · 16 downloads · 3 likes
HiMTok is a hierarchical mask token learning framework fine-tuned on the InternVL2_5-8B large multimodal model, focused on image segmentation tasks.
Tags: Image-to-Text

## Distill Any Depth Small Hf
MIT · xingyang1 · 1,214 downloads · 3 likes
Distill-Any-Depth is a state-of-the-art monocular depth estimation model trained via knowledge distillation, delivering efficient and accurate depth estimates.
Tags: 3D Vision, Transformers

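The knowledge-distillation setup behind models like this one trains a small student network to reproduce a larger teacher's per-pixel depth predictions. The toy arrays and plain mean-squared-error objective below are invented for illustration and are not the actual training recipe:

```python
import numpy as np

# Invented stand-ins for per-pixel depth maps from a large teacher and a
# small student on a single 2x2 image.
teacher_depth = np.array([[1.0, 2.0],
                          [3.0, 4.0]])
student_depth = np.array([[1.5, 2.0],
                          [2.5, 4.5]])

# A distillation objective penalizes the student for deviating from the
# teacher's predictions; a plain MSE term is shown here for clarity.
distill_loss = np.mean((student_depth - teacher_depth) ** 2)
```

Minimizing this loss over many images pushes the small student toward the teacher's behavior at a fraction of the inference cost.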
## Illumiyume Anime Style Noobai Xl Nai Xl V10 Sdxl
Other · John6666 · 5,080 downloads · 1 like
An anime-style text-to-image generation model based on Stable Diffusion XL, focused on high-quality anime character creation.
Tags: Image Generation, English

## Allenai.olmocr 7B 0225 Preview GGUF
DevQuasar · 239 downloads · 1 like
olmOCR-7B-0225-preview is an OCR-based image-to-text model developed by AllenAI, designed to extract and recognize text content from images.
Tags: Large Language Model

## Llava NeXT Video 7B Hf
FriendliAI · 30 downloads · 0 likes
LLaVA-NeXT-Video-7B-hf is a video-based multimodal model capable of processing video and text inputs to generate text outputs.
Tags: Video-to-Text, English

## Qwen2.5 Dyanka 7B Preview
Apache-2.0 · Xiaojian9992024 · 1,497 downloads · 8 likes
A 7B-parameter language model based on the Qwen2.5 architecture, created by fusing multiple pre-trained models using the TIES method.
Tags: Large Language Model, Transformers

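The TIES method mentioned above merges several fine-tuned checkpoints by trimming small-magnitude parameter deltas, electing a per-parameter sign, and averaging only the deltas that agree with it. A toy sketch on flat numpy arrays (the weights, density value, and helper name are invented for illustration, not the recipe actually used for this model):

```python
import numpy as np

def ties_merge(base, finetuned, density=0.5):
    """Toy TIES merge: trim deltas, elect a per-parameter sign, average agreers."""
    task_vectors = [w - base for w in finetuned]
    trimmed = []
    for tv in task_vectors:
        # Trim: keep only the top `density` fraction of entries by magnitude.
        k = int(np.ceil(density * tv.size))
        threshold = np.sort(np.abs(tv).ravel())[-k]
        trimmed.append(np.where(np.abs(tv) >= threshold, tv, 0.0))
    stack = np.stack(trimmed)
    # Elect sign: per-parameter sign of the summed trimmed deltas.
    elected = np.sign(stack.sum(axis=0))
    # Disjoint merge: average only entries whose sign agrees with the elected one.
    agree = (np.sign(stack) == elected) & (stack != 0)
    merged = (stack * agree).sum(axis=0) / np.maximum(agree.sum(axis=0), 1)
    return base + merged

# Two invented "fine-tuned" deltas around a zero base; where the trimmed
# entries disagree in sign, the merged delta is dropped entirely.
base = np.zeros(4)
merged = ties_merge(base, [np.array([1.0, -0.2, 0.5, 0.0]),
                           np.array([0.8, 0.3, -0.5, 0.0])])
```

The sign-election step is what distinguishes TIES from naive weight averaging: conflicting updates cancel out instead of being blended into a compromise value.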
## Vit So400m Patch16 Siglip 384.v2 Webli
Apache-2.0 · timm · 2,073 downloads · 0 likes
Vision Transformer model based on SigLIP 2, designed for image feature extraction and pre-trained on the webli dataset.
Tags: Text-to-Image, Transformers

## Vit So400m Patch14 Siglip 378.v2 Webli
Apache-2.0 · timm · 30 downloads · 0 likes
Vision Transformer model based on SigLIP 2, designed for image feature extraction and trained on the webli dataset.
Tags: Text-to-Image, Transformers

## Vit Large Patch16 Siglip Gap 384.v2 Webli
Apache-2.0 · timm · 95 downloads · 0 likes
A Vision Transformer model based on the SigLIP 2 architecture, in a Global Average Pooling (GAP) variant that removes the attention pooling head; suitable for image feature extraction tasks.
Tags: Text-to-Image, Transformers

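In these GAP variants, the pooled image embedding is simply the mean of the patch tokens, rather than the output of a learned attention-pooling head. A minimal sketch (batch size, token count, and feature width are illustrative, not read from the actual checkpoint):

```python
import numpy as np

# Hypothetical ViT patch tokens: 2 images, 24 x 24 = 576 patches for a
# 384px input at patch size 16, with 1024-dim features per token.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((2, 576, 1024))

# Global average pooling: averaging over the patch axis replaces the
# attention-pooling head, yielding one embedding per image.
pooled = tokens.mean(axis=1)
```

Dropping the attention-pooling head removes parameters and makes the encoder a drop-in feature extractor for downstream heads.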
## Vit Large Patch16 Siglip 512.v2 Webli
Apache-2.0 · timm · 295 downloads · 0 likes
A ViT image encoder based on SigLIP 2, packaged for timm and suitable for vision-language tasks.
Tags: Image Classification, Transformers

## Vit Giantopt Patch16 Siglip Gap 256.v2 Webli
Apache-2.0 · timm · 17 downloads · 0 likes
A SigLIP 2 ViT image encoder using global average pooling, with the attention pooling head removed, packaged for timm.
Tags: Image Classification, Transformers

## Vit Giantopt Patch16 Siglip 256.v2 Webli
Apache-2.0 · timm · 59 downloads · 0 likes
Vision Transformer model based on SigLIP 2, focused on image feature extraction.
Tags: Text-to-Image, Transformers

## Vit Base Patch32 Siglip 256.v2 Webli
Apache-2.0 · timm · 27 downloads · 0 likes
Vision Transformer model based on the SigLIP 2 architecture, designed for image feature extraction.
Tags: Text-to-Image, Transformers

## Vit Base Patch16 Siglip Gap 512.v2 Webli
Apache-2.0 · timm · 105 downloads · 0 likes
A ViT image encoder based on SigLIP 2, using global average pooling with the attention pooling head removed; suitable for image feature extraction tasks.
Tags: Image Classification, Transformers

## Vit Base Patch16 Siglip 512.v2 Webli
Apache-2.0 · timm · 2,664 downloads · 0 likes
Vision Transformer model based on SigLIP 2, designed for image feature extraction and pre-trained on the webli dataset.
Tags: Text-to-Image, Transformers

## Vit So400m Patch16 Siglip Gap 512.v2 Webli
Apache-2.0 · timm · 21 downloads · 0 likes
A ViT image encoder based on SigLIP 2, utilizing global average pooling; suitable for vision-language tasks.
Tags: Text-to-Image, Transformers

## Qwen2.5 14B CIC ACLARC GGUF
Apache-2.0 · sknow-lab · 42 downloads · 1 like
A quantized version of the Qwen2.5-14B-Instruct model, specifically designed for citation intent classification tasks.
Tags: Large Language Model, English

## Qwen2.5 14B CIC SciCite GGUF
Apache-2.0 · sknow-lab · 57 downloads · 1 like
A citation intent classification model fine-tuned from Qwen2.5-14B-Instruct, specializing in citation analysis for scientific literature.
Tags: Large Language Model, English

## Gliner Biomed Bi Large V1.0
Apache-2.0 · Ihor · 56 downloads · 1 like
GLiNER-BioMed is an efficient open NER model suite based on the GLiNER framework, specifically designed for the biomedical domain and able to recognize various types of biomedical entities.
Tags: Sequence Labeling, English

## Gliner Biomed Bi Base V1.0
Apache-2.0 · Ihor · 25 downloads · 1 like
GLiNER-BioMed is an efficient open biomedical named entity recognition model suite based on the GLiNER framework, capable of recognizing multiple entity types in the biomedical domain.
Tags: Sequence Labeling, English

## Healthgpt L14
MIT · lintw · 43 downloads · 7 likes
HealthGPT is a model specifically developed for unified multimodal medical tasks.
Tags: Large Language Model, English

## Cuckoo C4
MIT · KomeijiForce · 15 downloads · 1 like
Cuckoo is a small (300M-parameter) information extraction model that extracts information efficiently by mimicking the next-word prediction paradigm of large language models.
Tags: Large Language Model, Transformers

## ENEL
IvanTang · 17 downloads · 1 like
ENEL is a model exploring the potential of encoder-free architectures in 3D large multimodal models.
Tags: Image-to-Text

## Pi0
Apache-2.0 · lerobot · 11.84k downloads · 230 likes
Pi0 is a general robot control model based on a vision-language-action flow, supporting robot control tasks.
Tags: Multimodal Fusion

## Felguk Suno Or People
Apache-2.0 · Felguk · 58 downloads · 1 like
This model classifies audio clips as either 'Suno' music or 'People' music.
Tags: Audio Classification, Transformers, Supports Multiple Languages

## Internlm3 8b Instruct
Apache-2.0 · internlm · 53.04k downloads · 217 likes
InternLM3-8B-Instruct is an 8-billion-parameter instruction model developed by Shanghai AI Laboratory, designed for general-purpose use and advanced reasoning, featuring high efficiency and low cost.
Tags: Large Language Model

## Sam2 Hiera Small.fb R896
Apache-2.0 · timm · 142 downloads · 0 likes
A SAM2 model based on the HieraDet image encoder, focused on image feature extraction tasks.
Tags: Image Segmentation, Transformers